We are now entering the phase of serious statistics: we run regression models, inspect them, and plot and tabulate their results.

For this exercise, let’s again pretend to be researchers interested in the determinants of the subjectively estimated likelihood of being treated in a hospital for COVID-19. We suspect that age plays an important role, which we have already explored by visualizing a similar relationship. So… we need the GESIS Panel COVID-19 data again:

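# Packages for data wrangling, reading SPSS files, and handling labels
library(dplyr)
library(haven)
library(sjlabelled)

gp_covid <-
  read_sav(
    "../data/ZA5667_v1-1-0.sav"
  ) %>% 
  # set the codes -1 to -99 and 97 to NA in all variables
  set_na(na = c(-1:-99, 97)) %>% 
  mutate(
    # give the outcome a readable name and treat age categories as a factor
    likelihood_hospital = hzcy003a,
    age_cat = as.factor(age_cat)
  ) %>% 
  # drop the remaining SPSS labels
  remove_all_labels()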

We start our analysis by examining the relationship with an ANOVA. We also include some essential covariates: gender, political orientation, and marital status.

1

Run an ANOVA on the proposed relationship between the subjective likelihood of treatment in a hospital and age. Also, include all the covariates.
We ran a similar analysis before… you can simply add the control variables to the model formula with +.
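A minimal sketch of what such a model can look like, assuming base R's aov() (which may or may not be the function used earlier in the course). It runs on simulated toy data, because the GESIS variable names for gender, political orientation, and marital status are not given here; gender, pol_orientation, and marital_status are hypothetical placeholders.

```r
# Simulated toy data: NOT the GESIS Panel; covariate names are hypothetical
set.seed(1)
n <- 300
toy <- data.frame(
  age_cat         = factor(sample(1:5, n, replace = TRUE)),
  gender          = factor(sample(c("female", "male"), n, replace = TRUE)),
  pol_orientation = sample(1:11, n, replace = TRUE),
  marital_status  = factor(sample(c("married", "single", "other"), n, replace = TRUE))
)
# Outcome that increases mildly with age category, plus noise
toy$likelihood_hospital <- 0.3 * as.numeric(toy$age_cat) + rnorm(n)

# Control variables are simply added to the formula with +
fit_aov <- aov(
  likelihood_hospital ~ age_cat + gender + pol_orientation + marital_status,
  data = toy
)
summary(fit_aov)
```

Note that aov() reports sequential (type I) sums of squares, so with unbalanced data the order of the terms in the formula can change the results.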

Alright, there’s definitely something in the data we should investigate. Although it is sometimes considered bad practice, we could dichotomize the dependent variable with a median cut and run a logistic regression.

2

Make a median cut for the variable “likelihood_hospital” and run a logistic regression with all variables. What do you see in the results?
You can create new variables fairly easily with the function dplyr::mutate() in combination with the ifelse() function.
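A minimal sketch of the median cut with dplyr::mutate() and ifelse(), followed by a logistic regression via glm(). It again uses simulated toy data with hypothetical covariate names; hospital_high is a hypothetical name for the dichotomized outcome.

```r
library(dplyr)

# Simulated toy data: NOT the GESIS Panel; covariate names are hypothetical
set.seed(1)
n <- 300
toy <- data.frame(
  age_cat         = factor(sample(1:5, n, replace = TRUE)),
  gender          = factor(sample(c("female", "male"), n, replace = TRUE)),
  pol_orientation = sample(1:11, n, replace = TRUE),
  marital_status  = factor(sample(c("married", "single", "other"), n, replace = TRUE))
)
toy$likelihood_hospital <- 0.3 * as.numeric(toy$age_cat) + rnorm(n)

# Median cut: 1 if above the median, 0 otherwise (missing values stay NA)
toy <- toy %>%
  mutate(
    hospital_high = ifelse(
      likelihood_hospital > median(likelihood_hospital, na.rm = TRUE), 1, 0
    )
  )

# Logistic regression: binomial family with the (default) logit link
fit_logit <- glm(
  hospital_high ~ age_cat + gender + pol_orientation + marital_status,
  family = binomial(link = "logit"),
  data = toy
)
summary(fit_logit)
```

With a discrete outcome, many respondents may sit exactly at the median, so choosing > or >= for the cut can noticeably change the group sizes.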

Running one model is rarely enough. The assumptions of the logistic regression may not be fulfilled, or a reviewer may simply dislike this type of regression. Suppose she proposes instead to run a binomial regression with a cauchit link.

3

Run the same model, changing only the link function to cauchit. How would you interpret the different regression coefficients?
Have a look at the help page ?family to see how you can include a cauchit link.
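A minimal sketch, on simulated toy data with hypothetical covariate names, of how the link is switched: the binomial family accepts link = "cauchit". Keep in mind that the coefficients are then on the scale of the standard Cauchy quantile function rather than on the log-odds scale, so they cannot be exponentiated into odds ratios and their size is not directly comparable to the logit coefficients.

```r
# Simulated toy data: NOT the GESIS Panel; covariate names are hypothetical
set.seed(1)
n <- 300
toy <- data.frame(
  age_cat         = factor(sample(1:5, n, replace = TRUE)),
  gender          = factor(sample(c("female", "male"), n, replace = TRUE)),
  pol_orientation = sample(1:11, n, replace = TRUE),
  marital_status  = factor(sample(c("married", "single", "other"), n, replace = TRUE))
)
toy$likelihood_hospital <- 0.3 * as.numeric(toy$age_cat) + rnorm(n)

# Median cut of the outcome (1 = above the median)
toy$hospital_high <- ifelse(
  toy$likelihood_hospital > median(toy$likelihood_hospital, na.rm = TRUE), 1, 0
)

# Same binomial model; only the link function changes (see ?family)
fit_cauchit <- glm(
  hospital_high ~ age_cat + gender + pol_orientation + marital_status,
  family = binomial(link = "cauchit"),
  data = toy
)
summary(fit_cauchit)
```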

Ok, different link functions are sometimes used because they can provide a different model fit. That’s definitely something we should investigate as well.
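Why can the fit differ? The cauchit link is the quantile function of the standard Cauchy distribution, g(p) = tan(π(p - 1/2)), whose tails are much heavier than those of the logistic distribution. A small sketch comparing the two inverse link functions that R's binomial family uses:

```r
# Values of the linear predictor at which to compare the inverse links
eta <- c(-5, -2, 0, 2, 5)

logit_p   <- binomial(link = "logit")$linkinv(eta)
cauchit_p <- binomial(link = "cauchit")$linkinv(eta)

# Side by side: the cauchit curve approaches 0 and 1 much more slowly
round(data.frame(eta, logit_p, cauchit_p), 3)
```

Both curves pass through 0.5 at eta = 0, but extreme linear predictor values translate into less extreme probabilities under the cauchit link, which makes it less sensitive to observations that are far from what the model predicts.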

4

Compare both regression models with an ANOVA. Use the option test = "LRT" to perform a likelihood ratio test. What’s your interpretation?
A statistically significant p-value would indicate a difference between the models.
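A minimal sketch of the comparison on simulated toy data with hypothetical covariate names. Because both models contain exactly the same predictors and differ only in the link, they are not nested and the difference in residual degrees of freedom is 0; in that case anova() should report Df = 0 and no p-value. Comparing information criteria such as AIC() is one common alternative for non-nested models.

```r
# Simulated toy data: NOT the GESIS Panel; covariate names are hypothetical
set.seed(1)
n <- 300
toy <- data.frame(
  age_cat         = factor(sample(1:5, n, replace = TRUE)),
  gender          = factor(sample(c("female", "male"), n, replace = TRUE)),
  pol_orientation = sample(1:11, n, replace = TRUE),
  marital_status  = factor(sample(c("married", "single", "other"), n, replace = TRUE))
)
toy$likelihood_hospital <- 0.3 * as.numeric(toy$age_cat) + rnorm(n)
toy$hospital_high <- ifelse(
  toy$likelihood_hospital > median(toy$likelihood_hospital, na.rm = TRUE), 1, 0
)

# Identical formula for both models; only the link differs
f <- hospital_high ~ age_cat + gender + pol_orientation + marital_status
fit_logit   <- glm(f, family = binomial(link = "logit"),   data = toy)
fit_cauchit <- glm(f, family = binomial(link = "cauchit"), data = toy)

# Likelihood ratio test via anova(); Df is 0 because both models have the same parameters
cmp <- anova(fit_logit, fit_cauchit, test = "LRT")
cmp

# Information criteria: lower AIC indicates the better-fitting model
AIC(fit_logit, fit_cauchit)
```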